Accurate prediction by AlphaFold2 for ligand binding in a reductive dehalogenase and implications for PFAS (per- and polyfluoroalkyl substance) biodegradation

Despite the success of AlphaFold2 (AF2), it is unclear how AF2 models accommodate ligand binding. Here, we start with a protein sequence from Acidimicrobiaceae TMED77 (T7RdhA) with potential for catalyzing the degradation of per- and polyfluoroalkyl substances (PFASs). AF2 models and experiments identified T7RdhA as a corrinoid iron-sulfur protein (CoFeSP) that uses a norpseudo-cobalamin (BVQ) cofactor and two Fe4S4 iron-sulfur clusters for catalysis. Docking and molecular dynamics simulations suggest that T7RdhA uses perfluorooctanoic acid (PFOA) as a substrate, supporting the reported defluorination activity of its homolog, A6RdhA. We show that AF2 provides processual (dynamic) predictions for the binding pockets of ligands (cofactors and/or substrates). Because the pLDDT scores provided by AF2 reflect the protein native states in complex with ligands as the evolutionary constraints, the Evoformer network of AF2 predicts protein structures and residue flexibility in complex with the ligands, i.e., in their native states. Therefore, an apo-protein predicted by AF2 is actually a holo-protein awaiting ligands.


Contents of the Supplementary Information
The numbers in the figure show the total vertices of the clusters in different colors; 10 clusters were identified using a fast-greedy algorithm, and T7RdhA was located in the cluster shown in grey (126 proteins). (b) The cluster that includes T7RdhA; T7RdhA is highlighted as a large red vertex. (c) The largest clique containing T7RdhA (red) has 47 proteins. The reductive dehalogenase from Lokiarchaeota archaeon (GenBank ID: TFF69373.1, blue vertex) was observed in this clique. (d) AF2 structures of the proteins with full sequences (39 proteins) in the clique (c) indicate two branches of structures that resemble PceA (red spheres) and NpRdhA (blue squares). The phylogenetic tree is based on the RMSD measured between all pairs of protein structures. T7RdhA is located in the PceA branch. The two proteins with known X-ray crystallographic structures (PceA: PDB 4UQU and NpRdhA: PDB 4RAS) are also shown in the tree.

Fig. S1 indicates two conserved Fe4S4-binding motifs: motif 1 is (F)CX2CX2CX3C(P) (red box) and motif 2 is CX10-12CX2CX3C(P) (gold box); both motifs are also conserved in the PceA and NpRdhA reductive dehalogenases. Fe4S4-1 is bound by C296, C299, and C302 from motif 1 and C357 from motif 2, whereas Fe4S4-2 is bound by C339, C350, and C353 from motif 2 and C306 from motif 1 (T7RdhA numbering). Besides the conserved Cys residues, three fully conserved Trp residues are also labeled in the weblogo 2 plot. Binding of each Fe4S4 cluster requires the participation of both binding motifs.
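The two motif patterns quoted above translate directly into regular expressions. The following is a minimal sketch, not the authors' analysis pipeline: it assumes that X means any residue and that the parenthesized F and P are optional, and it uses a made-up toy sequence (`TOY_SEQUENCE`) rather than the real T7RdhA sequence. The helper names `motif_cysteines` and `cluster_ligands` are hypothetical.

```python
import re

# Motif patterns transcribed from the text (X = any residue).
# Assumption: the parenthesized residues (F) and (P) are optional.
# Each cysteine is captured so its position can be reported.
MOTIF_1 = re.compile(r"F?(C).{2}(C).{2}(C).{3}(C)P?")
MOTIF_2 = re.compile(r"(C).{10,12}(C).{2}(C).{3}(C)P?")

# Hypothetical toy sequence containing one copy of each motif.
TOY_SEQUENCE = "MKAFCAGCGACLKGCPDESRTLNQCATGLNVSEHWCGACVKACPGG"


def motif_cysteines(sequence, pattern):
    """Return the 1-based positions of the four cysteines for each match."""
    return [
        tuple(match.start(group) + 1 for group in range(1, 5))
        for match in pattern.finditer(sequence)
    ]


def cluster_ligands(motif1_cys, motif2_cys):
    """Assign cysteines to clusters following the pattern stated in the text:
    each cluster takes the first three Cys of one motif and the last Cys
    of the other motif."""
    return {
        "Fe4S4-1": tuple(motif1_cys[:3]) + tuple(motif2_cys[3:]),
        "Fe4S4-2": tuple(motif2_cys[:3]) + tuple(motif1_cys[3:]),
    }


if __name__ == "__main__":
    print(motif_cysteines(TOY_SEQUENCE, MOTIF_1))
    print(motif_cysteines(TOY_SEQUENCE, MOTIF_2))
    # Cysteine positions in T7RdhA numbering, as given in the text.
    print(cluster_ligands((296, 299, 302, 306), (339, 350, 353, 357)))
```

The spacing of the reported T7RdhA cysteines is consistent with both patterns: C296/C299/C302/C306 are separated by 2, 2, and 3 residues, and C339/C350/C353/C357 by 10, 2, and 3 residues.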

Plasmid design and construction
The T7RdhA plasmid was constructed by subcloning the T7RdhA gene block (ordered from IDT) into the pET15b vector using a single NdeI restriction site, which places a histidine tag at the N terminus of T7RdhA. The constructed plasmid was propagated in DH5α competent cells, isolated by maxi prep, and then transformed into E. coli BL21 DE3 cells for protein expression.

Protein expression and purification
For protein expression, one liter of autoclaved LB medium was inoculated with BL21 DE3 cells that had been transformed with the pET15b-T7RdhA construct. Cells were grown at 37 °C in two-liter baffled flasks with continuous shaking at 150 rpm in the presence of 100 µg/mL ampicillin. Once the culture reached an optical density (OD600nm) of 0.6-0.8, expression of the gene of interest was induced with 0.4 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) for 18 h at 37 °C to ensure expression of T7RdhA in inclusion bodies. Pellets from 1 L of induced overnight culture were resuspended in 50 mL of lysis buffer (50 mM Tris, pH 8, 100 mM NaCl, 100 µg/mL lysozyme). Lysates were sonicated for 5 min at 45% power (15 s on, 15 s off) and centrifuged (10,000 rpm, 10 min). After centrifugation, the insoluble pellet containing inclusion bodies was washed twice with 50 mM Tris-HCl, pH 8, 150 mM NaCl, and 1% Triton, followed by one wash with 50 mM Tris-HCl, pH 8, and 150 mM NaCl. The pellet was then resuspended in a denaturing buffer (50 mM Tris-HCl, pH 8, and 7.5 M guanidine hydrochloride), and T7RdhA was purified on an IMAC column. The purified protein was ethanol precipitated and resuspended in Tris-NaCl-urea buffer for anaerobic binding of the iron-sulfur clusters and cobalamin, as described by Nakamura et al. (ref. 3). After cofactor binding under denaturing conditions, T7RdhA was refolded and dialyzed using a 3 kDa dialysis cassette and refolding buffer (50 mM Tris-HCl, pH 8.0, 0.5 M NaCl, 20% glycerol, 0.2% CHAPS, 1 mM PMSF, and 10 mM DTT). Refolded protein was stored at -80 °C until use.
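As a worked example of the buffer arithmetic, the sketch below computes the amounts of solute needed for 50 mL of the lysis buffer described above. It is illustrative only: the molar masses are standard values, "Tris" is assumed to mean Tris base, pH adjustment is not modeled, and the function names are hypothetical.

```python
# Standard molar masses in g/mol. Assumption: "Tris" refers to Tris base.
MOLAR_MASS = {"Tris": 121.14, "NaCl": 58.44}


def grams_for_molar(conc_mM, volume_mL, molar_mass):
    """Mass in grams needed to reach conc_mM in volume_mL."""
    moles = (conc_mM / 1000.0) * (volume_mL / 1000.0)
    return moles * molar_mass


def mg_for_mass_conc(conc_ug_per_mL, volume_mL):
    """Mass in milligrams needed to reach conc_ug_per_mL in volume_mL."""
    return conc_ug_per_mL * volume_mL / 1000.0


# Lysis buffer from the text: 50 mM Tris, 100 mM NaCl, 100 µg/mL lysozyme.
VOLUME_ML = 50.0
tris_g = grams_for_molar(50.0, VOLUME_ML, MOLAR_MASS["Tris"])
nacl_g = grams_for_molar(100.0, VOLUME_ML, MOLAR_MASS["NaCl"])
lysozyme_mg = mg_for_mass_conc(100.0, VOLUME_ML)

if __name__ == "__main__":
    print(f"Tris: {tris_g:.3f} g")
    print(f"NaCl: {nacl_g:.3f} g")
    print(f"Lysozyme: {lysozyme_mg:.1f} mg")
```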

Densitometry
Purified proteins were quantified by densitometry on PAGE gels. Proteins were run on 12% SDS-PAGE gels followed by overnight staining with Coomassie brilliant blue (Figure S3). Gels were destained and imaged, and protein concentrations were calculated based on BSA standards run on the gel.
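Quantification against BSA standards is commonly done by fitting a standard curve of band intensity versus loaded BSA mass and reading the sample off that curve. The sketch below assumes a linear response and uses invented intensities and a hypothetical 5 µL load volume; the source does not specify the fitting procedure or the imaging software, so none of this should be read as the authors' exact method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mass_from_intensity(intensity, slope, intercept):
    """Invert the standard curve to get protein mass from band intensity."""
    return (intensity - intercept) / slope


# Hypothetical BSA standards: loaded mass (µg) and integrated band intensity.
bsa_ug = [0.5, 1.0, 2.0, 4.0]
bsa_intensity = [1100.0, 2100.0, 4100.0, 8100.0]

slope, intercept = fit_line(bsa_ug, bsa_intensity)

# Hypothetical sample band and load volume.
sample_intensity = 5100.0
load_volume_uL = 5.0
sample_ug = mass_from_intensity(sample_intensity, slope, intercept)
sample_mg_per_mL = sample_ug / load_volume_uL  # µg/µL equals mg/mL

if __name__ == "__main__":
    print(f"slope={slope:.1f}, intercept={intercept:.1f}")
    print(f"sample: {sample_ug:.2f} µg, {sample_mg_per_mL:.2f} mg/mL")
```

Sample bands whose intensity falls outside the range of the standards should be rerun at a different dilution rather than extrapolated.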

UV-Visible spectroscopy
Cofactor binding to the protein was evaluated by analyzing 2 µL of protein sample on a NanoDrop™ One/OneC Microvolume UV-Vis Spectrophotometer and running a full-spectrum scan.
To validate our structural prediction for cofactor binding, we cloned and expressed the T7RdhA protein in Escherichia coli as an N-terminal His-tagged construct (His-T7RdhA). Under anaerobic conditions, the denatured His-T7RdhA was purified and placed in a refolding buffer for cobalamin and iron-sulfur cluster binding, as described by Nakamura et al. (ref. 3). Binding of the cobalamin cofactor and Fe4S4 clusters was verified by the UV-Vis spectra shown in Figure S3.

Figure S6. Two selected cases for a contact map based on the conventional Cα-Cα distance map. (a) The Cα-Cα distance between R395 and E79 is longer than 8 Å, but these two residues form a strong salt-bridge interaction. (b) The Cα-Cα distance between R70 and D66 is shorter than 8 Å, but these two residues do not interact. The final snapshot of system 1 after a 300 ns MD simulation is used for both cases. The structures are plotted using VMD (ref. 66 in main text).
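The two cases in Figure S6 show why a Cα-Cα cutoff can disagree with a side-chain-based interaction test. The sketch below contrasts the two criteria on invented coordinates; the 8 Å Cα-Cα threshold comes from the caption, while the 4.0 Å nitrogen-oxygen cutoff for a salt bridge is an assumed, commonly used value that the source does not specify. The coordinates are hypothetical and are not taken from the MD snapshot.

```python
import math

CA_CUTOFF = 8.0           # Å, Cα-Cα contact threshold from the caption
SALT_BRIDGE_CUTOFF = 4.0  # Å, assumed N-O cutoff; not given in the source


def ca_contact(ca_a, ca_b, cutoff=CA_CUTOFF):
    """True if the two Cα atoms are closer than the cutoff."""
    return math.dist(ca_a, ca_b) < cutoff


def salt_bridge(basic_nitrogens, acidic_oxygens, cutoff=SALT_BRIDGE_CUTOFF):
    """True if any side-chain N of the basic residue lies within the cutoff
    of any side-chain O of the acidic residue."""
    return any(
        math.dist(n, o) < cutoff
        for n in basic_nitrogens
        for o in acidic_oxygens
    )


# Case (a), hypothetical: Cα atoms 10 Å apart, side chains pointing
# toward each other, so the charged groups are 2.8 Å apart.
case_a = {
    "ca": ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0)),
    "n": [(5.5, 0.0, 0.0)],
    "o": [(8.3, 0.0, 0.0)],
}

# Case (b), hypothetical: Cα atoms 6 Å apart, side chains pointing away.
case_b = {
    "ca": ((0.0, 0.0, 0.0), (6.0, 0.0, 0.0)),
    "n": [(-3.0, 5.0, 0.0)],
    "o": [(8.0, -3.0, 0.0)],
}

if __name__ == "__main__":
    for name, case in (("a", case_a), ("b", case_b)):
        print(name, ca_contact(*case["ca"]), salt_bridge(case["n"], case["o"]))
```

In case (a) the Cα criterion misses a real interaction, and in case (b) it reports a contact where the side chains do not interact, mirroring the R395/E79 and R70/D66 examples in the caption.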